Is Unlabeled Data Suitable for Multiclass SVM-based Web Page Classification?
Abstract
Support Vector Machines (SVMs) offer an interesting and effective approach to automated classification tasks. Although SVMs natively handle only binary, supervised problems, several works have extended them to multiclass and semi-supervised settings. A previous study on supervised and semi-supervised SVM classification over binary taxonomies showed that the latter clearly outperforms the former, demonstrating that unlabeled data is useful in the learning phase of this kind of task. However, whether unlabeled data is also useful for multiclass SVM tasks has never been tested before. In this work, we study whether unlabeled data can improve results for multiclass web page classification with SVMs. We conclude that it is preferable to rely only on labeled data, both because this improves (or at least matches) performance and because it reduces the computational cost.
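The abstract contrasts a purely supervised multiclass SVM with one that also learns from unlabeled pages. The sketch below illustrates both ideas in generic form, under stated assumptions: a linear SVM trained by subgradient descent on the L2-regularized hinge loss, a one-vs-rest decomposition to turn binary SVMs into a multiclass classifier, and simple self-training (retraining on confidently pseudo-labeled unlabeled examples) as the semi-supervised variant. These are common techniques chosen for illustration, not necessarily the methods evaluated in this study (transductive SVMs, for instance, are a different semi-supervised approach). The function names, hyperparameters, and toy 2-D "page features" are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], float]  # (weights, bias) of one linear binary SVM


def score(model: Model, x: Vector) -> float:
    """Decision value w.x + b of a linear binary SVM."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_binary_svm(xs: List[Vector], ys: List[int], lam: float = 0.01,
                     epochs: int = 300, lr: float = 0.1) -> Model:
    """Full-batch subgradient descent on
    (lam / 2) * ||w||^2 + mean(max(0, 1 - y * (w.x + b))), with ys in {+1, -1}."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw = [lam * wi for wi in w]  # gradient of the L2 regularizer (bias is not regularized)
        gb = 0.0
        for x, y in zip(xs, ys):
            if y * score((w, b), x) < 1:  # hinge term is active for this example
                for j in range(dim):
                    gw[j] -= y * x[j] / n
                gb -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, gw)]
        b -= lr * gb
    return w, b


def train_ovr(xs: List[Vector], labels: List[str], **kw) -> Dict[str, Model]:
    """One-vs-rest: one binary SVM per class, trained on that class against all others."""
    return {c: train_binary_svm(xs, [1 if lab == c else -1 for lab in labels], **kw)
            for c in sorted(set(labels))}


def predict(models: Dict[str, Model], x: Vector) -> Tuple[str, float]:
    """Return the class whose binary SVM gives the largest decision value, and that value."""
    scores = {c: score(m, x) for c, m in models.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def self_train(xs: List[Vector], labels: List[str], unlabeled: List[Vector],
               threshold: float = 0.5, rounds: int = 3, **kw) -> Dict[str, Model]:
    """Self-training: pseudo-label the unlabeled points the current models are
    confident about (decision value >= threshold), add them, and retrain."""
    xs, labels, pool = list(xs), list(labels), list(unlabeled)
    models = train_ovr(xs, labels, **kw)
    for _ in range(rounds):
        preds = [(x,) + predict(models, x) for x in pool]  # (point, class, score)
        confident = [(x, c) for x, c, s in preds if s >= threshold]
        if not confident:
            break  # nothing new to add; keep the current models
        for x, c in confident:
            xs.append(x)
            labels.append(c)
        pool = [x for x, _, s in preds if s < threshold]
        models = train_ovr(xs, labels, **kw)  # every round retrains all classifiers
    return models


# Hypothetical toy data: 2-D vectors standing in for web page features.
labeled = [(0.0, 2.0), (0.2, 2.1), (2.0, 0.0), (2.1, 0.3), (-2.0, -2.0), (-2.1, -1.8)]
classes = ["sports", "sports", "news", "news", "tech", "tech"]
unlabeled = [(0.1, 2.05), (2.05, 0.15), (-2.05, -1.9)]

supervised = train_ovr(labeled, classes)
semi_supervised = self_train(labeled, classes, unlabeled)
for u in unlabeled:
    print(u, predict(supervised, u)[0], predict(semi_supervised, u)[0])
```

Each self-training round retrains every one-vs-rest classifier on a larger set, which gives a concrete sense of the extra computational cost that using unlabeled data can add; whether it also improves accuracy depends on the data.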
Similar resources
Comparativa de Aproximaciones a SVM Semisupervisado Multiclase para Clasificación de Páginas Web (A Comparison of Approaches to Multiclass Semi-supervised SVM for Web Page Classification)
In this paper we present a study of semi-supervised multiclass web page classification using SVM. We propose not only combining binary semi-supervised classifiers, but also using multiclass supervised ones. Our experiments show strong performance for the latter method: in some cases it can be better to ignore unlabeled documents and use only labeled documents for the learning task, directly based...
The Construction of Support Vector Machine Classifier Using the Firefly Algorithm
Parameter setting in support vector machines (SVMs) is very important for their accuracy and efficiency. In this paper, we employ the firefly algorithm to train all parameters of the SVM simultaneously, including the penalty parameter, smoothness parameter, and Lagrangian multiplier. The proposed method is called the firefly-based SVM (firefly-SVM). This tool is not conside...
Boosting for multiclass semi-supervised learning
Supervised learning methods are effective when there are sufficient labeled instances. In many applications, however, such as object detection and document and web-page categorization, labeled instances are difficult, expensive, or time-consuming to obtain because they require empirical research or experienced human annotators. Semi-supervised learning algorithms use not only the labeled data but a...
Web Page Classification Using Relational Learning Algorithm and Unlabeled Data
Applying relational tri-training (R-tri-training for short) to web page classification is investigated in this paper. R-tri-training, as a new relational semi-supervised learning algorithm, is well suited to learning in web page classification. The semi-supervised component of R-tri-training allows it to exploit unlabeled web pages to enhance learning performance effectively. In addition,...
Identifying Efficient Kernel Function in Multiclass Support Vector Machines
The support vector machine (SVM) is a novel kernel-based pattern classification method that is significant in many areas such as data mining and machine learning. A unique strength is its use of a kernel function to map the data into a higher-dimensional feature space. In training an SVM, the kernel and its parameters play a vital role in classification accuracy. Therefore, a suitable kernel design and it...
Publication date: 2009